Advanced Geospatial Data Processing for Social Scientists
Dennis Abel & Stefan Jünger
2025-04-29
Day       Time         Title
--------  -----------  -----------------------------------------------
April 28  10:00-11:15  Introduction
April 28  11:15-11:30  Coffee Break
April 28  11:30-13:00  Raster data in R
April 28  13:00-14:00  Lunch Break
April 28  14:00-15:15  Raster data processing
April 28  15:15-15:30  Coffee Break
April 28  15:30-17:00  Graphical display of raster data in maps
April 29  10:00-11:15  Datacube processing I
April 29  11:15-11:30  Coffee Break
April 29  11:30-13:00  Datacube processing II & API access
April 29  13:00-14:00  Lunch Break
April 29  14:00-15:15  Data integration and linking (with survey data)
April 29  15:15-15:30  Coffee Break
April 29  15:30-17:00  Outlook and open session with own application
Survey data: ISSP Environment (1993-2020)
After downloading the ISSP data file from the website, we placed it in our ./data folder. We have already prepared a script called prep_issp.R in the ./R folder to prepare the data, which we call using source().
source("./R/prep_issp.R")
head(issp, 10)
year country concern
1 1993 Australia 4
2 1993 Australia 4
3 1993 Australia 5
4 1993 Australia 3
5 1993 Australia 3
6 1993 Australia 3
7 1993 Australia 3
8 1993 Australia 4
9 1993 Australia 3
10 1993 Australia 3
Climate concern in 2020
There’s a nice item in the ISSP where respondents could evaluate whether
“A rise in world’s temperature is (dangerous/ not dangerous) for environment”
EO indicators
We want to investigate whether temperature anomalies in the year of the survey, compared to a long-running average, are associated with climate change concerns.
Indicator: Temperature - annual average
Intensity: Anomaly (mean deviation)
Focal time period: 2020
Baseline period: 1961-1990
Spatial buffer: Country
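Conceptually, the indicator boils down to subtracting the baseline mean from the focal-period mean. A minimal base-R sketch for a single country, using invented monthly values:

```r
# Toy monthly mean temperatures (Kelvin) for one country; values are invented
baseline_monthly <- c(283.1, 283.4, 283.0)  # stand-in for 1961-1990 months
focal_monthly    <- c(284.2, 284.6, 284.0)  # stand-in for 2020 months

# Anomaly = focal-period mean minus baseline mean
anomaly <- mean(focal_monthly) - mean(baseline_monthly)
anomaly  # positive values indicate warming relative to the baseline
```

The real computation below does exactly this, only on gridded monthly rasters aggregated to countries.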
ERA5 data
The ERA5-Land reanalysis from the Copernicus Climate Change Service (C3S) is a suitable data product for this temperature indicator. It provides air temperature at 2 meters above the surface from 1950 onwards, has a spatial resolution of 0.1 x 0.1 degrees, and offers global coverage. We can apply our conceptualization exactly to these data.
Data access and preparation
To access the data, we need an ECMWF account. Utilizing the ecmwfr package, we can access the data directly in R. Given that we want to aggregate the data at the country level, we first load country vector data and download the data according to the spatial extent of the countries included in the survey. The ISSP has a diverse membership from North and South America, Europe, Africa, and Asia. Thus, we can work with a global spatial extent when downloading the EO indicator.
We also need some packages to load and prepare the world map and process the raster files (rnaturalearth, sf, terra, and tidyverse). We also need the keyring package to safely store our ECMWF-API key.
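A sketch of the country-polygon setup, assuming the world map comes from rnaturalearth (the object name world matches its use in the later snippets):

```r
library(rnaturalearth)
library(sf)

# Country polygons as an sf object; these serve as the aggregation units
# and define the spatial extent for the download
world <- rnaturalearth::ne_countries(scale = "medium", returnclass = "sf")
```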
A final step before we can access the data from the Copernicus API is to store our API key. The ecmwfr functions retrieve the key automatically when it is stored under the service name "wf_api_key".
# Store as environment variable
Sys.setenv(WF_API_KEY = "MY-API-KEY")
api_key <- Sys.getenv("WF_API_KEY")

keyring::key_set_with_value(service = "wf_api_key", password = api_key)
Now we can access the data. We loop the download over the four years of the survey program (1993, 2000, 2010, 2020) to create four separate files.
Downloading the data
# API access looped over four years
for (yr in c("1993", "2000", "2010", "2020")) {
  # Create file names which include year
  file_name <- paste0("era5_temperature", yr, ".grib")

  # Specify API request
  request <- list(
    data_format = "grib",
    variable = "2m_temperature",
    product_type = "monthly_averaged_reanalysis",
    time = "00:00",
    year = yr,
    month = c("01", "02", "03", "04", "05", "06",
              "07", "08", "09", "10", "11", "12"),
    area = c(90, -180, -90, 180),
    dataset_short_name = "reanalysis-era5-land-monthly-means",
    target = file_name
  )

  # Download data from C3S
  file_path <- ecmwfr::wf_request(
    request = request,
    transfer = TRUE,
    path = "./data/EO_data/C3S_data",
    verbose = FALSE
  )
}
Importing the data
Importing the data into R should feel natural by now. We simply use terra::rast() for this endeavour.
Let’s inspect the 2020 datacube and plot its first layer (January 2020). The file’s attributes tell us the dimensions (number of rows, columns, and layers), the resolution, the spatial extent, the coordinate reference system, the units, and the time points.
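This inspection step can be sketched without the downloaded GRIB file by building a small in-memory 12-layer SpatRaster as a stand-in for the monthly datacube (grid size and cell values here are invented):

```r
library(terra)

# Synthetic monthly "datacube": 12 layers on a coarse global grid;
# real code would instead call terra::rast() on the downloaded file path
cube <- terra::rast(
  nrows = 18, ncols = 36, nlyrs = 12,
  xmin = -180, xmax = 180, ymin = -90, ymax = 90,
  vals = rnorm(18 * 36 * 12, mean = 288, sd = 10)
)
names(cube) <- month.name

cube                     # prints dimensions, resolution, extent, and CRS
terra::plot(cube[[1]])   # first layer, i.e. January
```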
Now, we can aggregate the monthly values by year and country. Before extracting, we check that our country polygons and the raster files share the same CRS and align, reprojecting if necessary.
for (yr in c("1993", "2000", "2010", "2020")) {
  temp_data <- get(paste0("temp_", yr))

  # Check CRS of both datasets and adjust if necessary
  if (!identical(terra::crs(world), terra::crs(temp_data))) {
    world <- world |> sf::st_transform(sf::st_crs(temp_data))
  }

  # Collapse the month layers into one layer by averaging across months
  annual_values <- terra::app(temp_data, fun = mean, na.rm = TRUE, cores = 4)

  # Aggregate by country
  country_values <- terra::extract(
    annual_values,
    world,
    fun = mean,
    na.rm = TRUE
  )

  # Add values to shapefile
  world[paste0("temp_", yr)] <- country_values[, 2]
}
The result
We now have our country polygon vector file with yearly mean temperatures for each survey year.
We proceed analogously for the baseline period (1961-1990): we aggregate these data at the country level and add them to our country polygon vector data.
# Check CRS of both datasets and adjust if necessary
if (!identical(terra::crs(world), terra::crs(temp_base))) {
  world <- world |> sf::st_transform(crs = sf::st_crs(temp_base))
}

# Collapse all into one layer by averaging across months and years
annual_values <- terra::app(temp_base, fun = mean, na.rm = TRUE, cores = 4)

# Aggregate by country
country_values <- terra::extract(
  annual_values,
  world,
  fun = mean,
  na.rm = TRUE
)

# Add values to vector data
world$temp_base <- country_values[, 2]
Now that we have the focal and baseline values, we can calculate the deviations for each survey year.
world <- world |>
  dplyr::mutate(
    diff_1993 = temp_1993 - temp_base,
    diff_2000 = temp_2000 - temp_base,
    diff_2010 = temp_2010 - temp_base,
    diff_2020 = temp_2020 - temp_base
  )

# Plot 2020 deviation from baseline
ggplot(data = world) +
  geom_sf(aes(fill = diff_2020)) +
  scale_fill_viridis_c() +
  theme_minimal() +
  labs(
    title = "Absolute deviation between 2020 and baseline temperature",
    subtitle = "Averaged across countries",
    fill = "Temperature (K)"
  )
Let’s use these data
Remember, our question was whether temperature deviations from the baseline period (1961-1990) in the survey year correspond with the overall climate concern within a country. We can address this question, e.g., by aggregating the survey data as follows. Note that we create a within-country z-standardized measure of climate concern along the way.
issp_aggregated <- issp |>
  dplyr::mutate(
    country = ifelse(
      country == "USA", "United States of America", country
    )
  ) |>
  dplyr::group_by(country) |>
  dplyr::mutate(concern = scale(concern)) |>
  dplyr::group_by(country, year) |>
  dplyr::summarize(
    concern = mean(concern, na.rm = TRUE),
    .groups = "drop"
  )

issp_aggregated
# A tibble: 102 × 3
country year concern
<chr> <dbl> <dbl>
1 Australia 1993 0.140
2 Australia 2010 -0.263
3 Australia 2020 0.227
4 Austria 2000 -0.0158
5 Austria 2010 -0.225
6 Austria 2020 0.187
7 Bulgaria 1993 0.0797
8 Bulgaria 2000 -0.0766
9 Bulgaria 2010 -0.00632
10 Canada 1993 0.0335
# ℹ 92 more rows
Structurally harmonizing the world data
Formally, the ISSP data are now in an aggregated long format. We have to wrangle our world data, including the temperature measures, to harmonize both datasets. Note that we z-standardize temperature differences across the whole period and countries this time, as they do not differ within countries in a specific year.
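The wide-to-long step can be sketched in base R. The toy data below mirror the diff_* columns created above (country names and values are invented); reshape() is one way to pivot, tidyr::pivot_longer() would be the tidyverse equivalent:

```r
# Toy wide-format stand-in for the `world` attribute table (values invented)
world_df <- data.frame(
  country   = c("Australia", "Austria"),
  diff_1993 = c(0.5, 0.7),
  diff_2020 = c(1.1, 1.4)
)

# Pivot the diff_* columns into long format: one row per country-year
world_long <- reshape(
  world_df,
  direction = "long",
  varying   = c("diff_1993", "diff_2020"),
  v.names   = "temp_diff",
  timevar   = "year",
  times     = c(1993, 2020),
  idvar     = "country"
)

# z-standardize across all countries and years (not within countries),
# since the deviations do not vary within a country in a given year
world_long$temp_diff_z <- as.numeric(scale(world_long$temp_diff))
```

The resulting long table can then be joined to issp_aggregated by country and year.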